Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TabletManager: Check for context cancellation in loop within ChangeTabletType() #9842

Merged

Conversation

rohit-nayak-ps
Copy link
Contributor

Description

There is currently the possibility of an infinite loop inside ChangeTabletType() because we don't check for context cancellation inside an active polling loop. This PR adds the check.

Related Issue(s)

#9838

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@@ -186,6 +186,9 @@ func (ts *tmState) ChangeTabletType(ctx context.Context, tabletType topodatapb.T
// We must read the data again and verify whether the previous write succeeded or not.
// The only way to guarantee safety is to keep retrying read until we succeed
for {
if ctx.Err() != nil {
return ctx.Err()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had added the infinite loop intentionally, because we enter that code only if a topo write has failed. Once a topo write fails, the only way forward is to keep trying to read the topo server, because the information may or may not have been written.
However, I do understand the use case where they want to retry doing the same operation again, so we need to exit sometime and not wait indefinitely.

I just think we should wrap the returned error and explicitly say that the topo information might be different from the tablet information and that the user must call ChangeTabletType again to fix it.
Other than that, this looks good to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think we should wrap the returned error and explicitly say that the topo information might be different from the tablet information and that the user must call ChangeTabletType again to fix it.

Suggestions for the error message? return fmt.Errorf("context canceled updating tablet_type for %s in the topo, please retry", ts.tm.tabletAlias) ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think that will work!

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps merged commit 63a4f00 into vitessio:main Mar 11, 2022
@rohit-nayak-ps rohit-nayak-ps deleted the change-tablet-type-fix-infinite-loop branch March 11, 2022 09:39
deepthi pushed a commit that referenced this pull request Apr 4, 2022
… within ChangeTabletType() (#10035)

* Check for context cancellation in loop within ChangeTabletType()

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* Change error message

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants